SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

REPORT DOCUMENTATION PAGE

REPORT NUMBER: 16669-9-M

TITLE: Kinds of Polyconfidence Intervals for Centers, and Some Thoughts on Identification and Selection of Confidence Procedures Using Configural Polysampling

AUTHOR: John W. Tukey

TYPE OF REPORT: Technical

CONTRACT OR GRANT NUMBER: DAAG29-79-C-0205

PERFORMING ORGANIZATION NAME AND ADDRESS:
Princeton University
Princeton, NJ 08544

CONTROLLING OFFICE NAME AND ADDRESS:
U. S. Army Research Office
Post Office Box 12211
Research Triangle Park, NC 27709

SUPPLEMENTARY NOTES:
The view, opinions, and/or findings contained in this report are those of the
author(s) and should not be construed as an official Department of the Army
position, policy, or decision, unless so designated by other documentation.

ABSTRACT:
If we deal initially only with single situations, monoconfidence intervals can be
either required only to be weak, to have proper confidence when averaged over all
configurations, or to be strong, when they must have the appropriate properties
conditionally for each configuration. (We assume location-and-scale invariance
throughout, both for situations and confidence intervals.) When one considers a
plot involving several situations, three kinds of polyconfidence intervals seem
worth mention: (a) doubly-strong, whose properties hold separately for each
combination of a situation and a configuration, (b) weak, whose properties
hold on average for each situation, (c) singly-strong, where the properties
hold, first on average for each situation and, second, for each configuration
and one situation. These notions, as well as that of balance in a
conservative sense, are explored in the framework of configurations...




Kinds  of  Polyconfidence  Intervals  for  Centers,  and 
Some  Thoughts  on  Identification  and  Selection  of 
Confidence  Procedures  Using  Configural 
Polysampling*

by 


John  W.  Tukey 
Princeton  University 
Princeton,  New  Jersey  08544 
and 

Bell  Laboratories 
Murray  Hill,  New  Jersey  07974 




Technical  Report  No.  190,  Series  2 
Department  of  Statistics 
Princeton  University 
April  1981 


*Prepared in part in connection with research
at Princeton University, supported by the
Army Research Office (Durham), and in part in
connection with research at Bell Telephone
Laboratories.




ABSTRACT 


If we deal initially only with single situations,
monoconfidence intervals can be either
required only to be weak, to have proper confidence
when averaged over all configurations, or to
be strong, when they must have the appropriate
properties conditionally for each configuration.
(We assume location-and-scale invariance
throughout both for situations and confidence
intervals.) When we consider a plot involving
several situations, three kinds of polyconfidence
intervals seem worth mention: (a) doubly-strong,
whose properties hold separately for each combination
of a situation and a configuration, (b) weak,
whose properties hold on average for each situation,
(c) singly-strong, where the properties
hold, first on average for each situation and,
second, for each configuration and one situation.
These notions, as well as that of balance in a
conservative sense, are explored in the framework
of configurations.

Attention then shifts to finding such
polyconfidence intervals, using configural
polysampling* as the principal tool.

*See Technical Report No. 185, by Pregibon and
Tukey, for general background.


INTRODUCTION 



1.  The  one-situation  Framework 

We begin with the classic case of a single location and
scale situation, whereby

$$y = (y_1, y_2, \ldots, y_n)$$

is, for a simple situation, a set of n iid quantities or,
for a compound situation, a set of n evid quantities, whose
underlying distributions are in either case completely
specified except for location and scale changes.

Here "iid" stands for "independently and identically
distributed" and "evid" stands for "an exchangeable version
of independently distributed". The latter means that
$y_1, y_2, \ldots, y_n$ are an unknown, equiprobable permutation of
$z_1, z_2, \ldots, z_n$, which are independent with $z_i$ distributed
according to $f_i(z)$. The one-wild Gaussian situation is a
classical example. (Still more complex sorts of situation
have not, as yet, been introduced.)

We find it somewhere between convenient and essential
to work with the order statistic form of the y's, which
could be termed "oiid" in the simple case and "ovid" in the
compound case. So we take $y_1 \le y_2 \le \cdots \le y_n$ as certain.

Our concern is with estimating a location $\mu$, which we
assume is defined as part of each situation. We require
location-and-scale invariance, both of the situation and of
the estimate, and thus find it natural to work with
location-and-scale configurations, conveniently parameterized
(for any choice of a and b with $1 \le a < b \le n$) by

$$y_a = r$$
$$y_b = r + s$$
$$y_i = r + s c_i \quad \text{for all other } i$$

Here, in view of the ordering, we must have

$$c_i \le 0 \quad \text{all } i < a$$
$$0 \le c_i \le 1 \quad \text{all } i \text{ with } a < i < b$$
$$1 \le c_i \quad \text{all } i > b$$

When  doing  numerical  calculations  it  seems  desirable  to  take 
a  near  n/4  and  b  near  3n/4. 

We call the (n-2)-component vector
$c = (c_1, c_2, \ldots, (\ ), \ldots, (\ ), \ldots, c_n)$
the configuration (more precisely, but rarely, the
(a,b)-location-and-scale-configuration). Here the ( )'s remind
us, this once only, of the omission of coordinates for
i = a and i = b.

The configuration, and the two anchors, $y_a$ and $y_b$,
together

* determine uniquely, and are uniquely determined by,
the observed y's,

* are all that is known to us,

* are the basis on which we are to construct estimates,
confidence intervals, etc. Because of the invariance
requirement, we can only use the anchors $y_a$ and $y_b$
linearly. That is, only numbers of the form

$$y_a + t(y_b - y_a)$$

can be used for estimates, confidence interval end
points, and the like, where t will usually be a function
of the configuration c.
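
The passage from a sample to its anchors and configuration can be sketched in a few lines. This is only an illustration: the function names `to_configuration` and `point_from_t` and the sample values are assumptions, not part of the report. Here the configuration is kept as an n-entry list with the two fixed entries 0 and 1 left in place.

```python
def to_configuration(y, a, b):
    """Split a sample into anchors (r, s) and its (a, b)-configuration.

    a and b are 1-based ranks with 1 <= a < b <= n, as in the text.
    Returns r = y_(a), s = y_(b) - y_(a), and the list c with
    c[a-1] = 0 and c[b-1] = 1 kept as fixed entries.
    """
    ys = sorted(y)
    n = len(ys)
    if not (1 <= a < b <= n):
        raise ValueError("need 1 <= a < b <= n")
    r = ys[a - 1]
    s = ys[b - 1] - ys[a - 1]
    if s <= 0:
        raise ValueError("anchors coincide; configuration undefined")
    c = [(v - r) / s for v in ys]
    return r, s, c


def point_from_t(r, s, t):
    """Map a position t in the c-picture back to the y scale: r + s*t."""
    return r + s * t


# Hypothetical sample with n = 8, a near n/4 and b near 3n/4.
y = [3.1, 0.4, 2.2, 5.9, 1.7, 2.8, 4.0, 2.5]
r, s, c = to_configuration(y, a=2, b=6)

# A location-and-scale change moves the anchors but not the configuration.
shifted = [10.0 + 3.0 * v for v in y]
r2, s2, c2 = to_configuration(shifted, a=2, b=6)
```

The second call illustrates why the configuration is "pure shape": shifting and rescaling the data changes only r and s.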

2.  Monoconfidence  Intervals 

In more graphic terms, we have two anchors, $y_a$ and $y_b$,
and a picture of the configuration including 0 where $c_a$
would otherwise be and 1 where $c_b$ would otherwise be. The
configuration is pure shape; the anchors are for location
and scale. We look at the picture and decide where in the
picture we wish to put a point (an estimate, a confidence
interval endpoint, etc.). If this is to be at t in the
c-picture, the corresponding y-like value is $y_a + t(y_b - y_a)$.
Our requirement of location-and-scale invariance does not
allow us to even think about the actual values of $y_a$ and $y_b$
while choosing t.

A pair of functions of configuration L(c), U(c) define
an [exact, conservative] p% monoconfidence interval for $\mu$
for a given situation Q, if

$$\mathrm{Prob}_Q\{r + L(c)s \le \mu \le r + U(c)s\}$$

is [equal to, greater than or equal to] p/100. Here $\mathrm{Prob}_Q\{\ \}$
applies equally to each and every set of underlying
distributions in the situation Q. (These differ only by
location-and-scale changes. If equality or inequality holds
for one instance of the situation, it holds for all.)


We can easily ask a little more. The ends, r + sL(c)
and r + sU(c), of our monoconfidence interval divide
$(-\infty, \infty)$ into three intervals. We may also want some sort of
balance. The natural requirement is that neither outer interval
should have an excessive chance of containing $\mu$, that both

$$\mathrm{Prob}_Q\{\mu < r + sL(c)\} \le \frac{100-p}{200}$$

and

$$\mathrm{Prob}_Q\{r + sU(c) < \mu\} \le \frac{100-p}{200}.$$

When these "end conditions" hold we will speak of a balanced
monoconfidence interval. Such a monoconfidence interval
provides both two-sided and one-sided confidence statements.


We know that there can be many different kinds of
monoconfidence intervals for a given situation Q. Fisher ( )
introduced the idea of requiring behavior of estimates, etc.
to hold separately for each recognizable distinction among
the data possibilities considered. In our set-up, this
would mean that our confidence should hold conditionally
upon the configuration c, so that

$$\mathrm{Prob}_Q\{r + L(c)s \le \mu \le r + U(c)s \mid c\}$$

is [equal to, greater than or equal to] p/100. We will call
a monoconfidence interval fulfilling such a condition a strong
monoconfidence interval.

Such a requirement greatly reduces flexibility in
choosing confidence intervals. More precisely:

* it rules out many wasteful choices for monoconfidence
intervals, and

* in special circumstances (e.g. Cox 19..) keeps us
from making swaps of probability between recognizably
different bodies of data which could lead to apparent
overall gain. By restricting ourselves to strong
monoconfidence intervals, we are immune to challenges of
the form "but look at your configuration, you know what
happens under situation Q with such configurations".
As a result, challenges will have to be to the situation
itself.


*  the  con-con  function  * 
For any real t, the value of

$$\mathrm{Prob}_Q\{\mu < r + ts \mid c\} = G_{Qc}(t)$$

depends only on the indicated arguments, Q and c. As
generating a conditional confidence statement based on the
configuration, it is mnemonic to call $G_{Qc}(\ )$ the con-con
function, given Q and c.

Every strong monoconfidence interval under situation Q
arises by satisfying

$$G_{Qc}(L(c)) \le d$$
$$G_{Qc}(U(c)) \ge d + \frac{p}{100}$$

for a suitable d, as is easy to see.


The minimal (i.e., not purely shortenable) strong
monoconfidence intervals, given Q and c, arise from equality in
these two inequalities. Moreover, there is a minimal
balanced strong monoconfidence interval given Q. This is found
by solving

$$G_{Qc}(L(c)) = \frac{100-p}{200}$$
$$G_{Qc}(U(c)) = \frac{100+p}{200}$$


3. Qualitative Discussion of Polyconfidence Intervals;
Doubly-strong and Weak Instances

Since challenge is now restricted to challenging Q, it
is both natural and important to consider at least several
Q's. For a qualitative discussion, we do not need to say
just how many we consider, so let {Q} be a collection of
situations. Notice that we may as well call a collection
of situations a plot.

Our concern is then with kinds of polyconfidence situations
as seen in the light of a specified plot.

* doubly-strong polyconfidence *

The strongest requirement we could make is that of a
doubly-strong polyconfidence interval, for which we require

$$\mathrm{Prob}_Q\{r + sL(c) \le \mu \le r + sU(c) \mid c\} \ge \frac{p}{100}$$

for all Q and c. So long as we restrict ourselves to Q's in
a particular plot, we can hardly ask very much more than
this. (Indeed, it may well be that we are asking so much
that we would be willing to take less.)


We can try to ask a little more, however, namely balance.
This requires that, for all c and all Q in the plot
$\Pi$, both

$$G_{Qc}(L(c)) \le \frac{100-p}{200}$$

and

$$G_{Qc}(U(c)) \ge \frac{100+p}{200}.$$

If we write

$$G_{-\Pi c}(t) = \min\{G_{Qc}(t) \mid Q \text{ in } \Pi\}$$
$$G_{+\Pi c}(t) = \max\{G_{Qc}(t) \mid Q \text{ in } \Pi\}$$

then a balanced doubly-strong confidence interval is one for
which

$$G_{+\Pi c}(L(c)) \le \frac{100-p}{200}$$
$$G_{-\Pi c}(U(c)) \ge \frac{100+p}{200}$$

so that the minimal (tightest) such is given by

$$L(c) = G_{+\Pi c}^{-1}\left(\frac{100-p}{200}\right)$$
$$U(c) = G_{-\Pi c}^{-1}\left(\frac{100+p}{200}\right)$$

For a particular p and $\Pi$, it is possible that there are
no solutions to these equations. This could happen if the
con-cons $G_{Qc}(t)$ for individual Q's differed too much. (This
can't happen for plots containing a finite number of Q's.)
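
For a finite plot, the two envelope equations can be solved numerically by bisection, since a maximum or minimum of nondecreasing functions is again nondecreasing. The sketch below is an illustration under stated assumptions: the logistic con-con functions, their centers and scales, and the names `logistic_con_con`, `solve_increasing` and `doubly_strong_limits` are all hypothetical stand-ins, not quantities from the report.

```python
import math


def logistic_con_con(center, scale):
    """Return a hypothetical ogive-shaped con-con function t -> G(t)."""
    def G(t):
        z = (t - center) / scale
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)          # avoids overflow for very negative z
        return e / (1.0 + e)
    return G


def solve_increasing(F, target, lo=-1e3, hi=1e3, iters=200):
    """Bisection for F(t) = target, with F continuous and nondecreasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def doubly_strong_limits(con_cons, p):
    """Minimal balanced doubly-strong limits for one configuration c.

    con_cons holds one con-con function per situation in the plot.
    L solves G_plus(L) = (100-p)/200 and U solves G_minus(U) = (100+p)/200.
    """
    G_plus = lambda t: max(G(t) for G in con_cons)
    G_minus = lambda t: min(G(t) for G in con_cons)
    L = solve_increasing(G_plus, (100.0 - p) / 200.0)
    U = solve_increasing(G_minus, (100.0 + p) / 200.0)
    return L, U


# A hypothetical plot of three situations, seen at one configuration.
plot = [
    logistic_con_con(0.4, 0.10),
    logistic_con_con(0.5, 0.25),
    logistic_con_con(0.6, 0.15),
]
L, U = doubly_strong_limits(plot, p=95.0)
```

By construction the resulting (L, U) contains the minimal balanced interval of every single situation in the plot, which is one way to see why doubly-strong intervals tend to be long.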


*  weak  polyconfidence  * 

A doubly-strong balanced polyconfidence interval is
immune to any challenge that does not challenge the plot $\Pi$.
Accordingly, it will tend to be a long interval. If we ask
less, we might shorten it, perhaps considerably on average.

Reducing the size of the plot would surely do this, but
suppose this is not desired. We could give up all trace of
the "strong" requirement, and ask only that

$$\mathrm{Prob}_Q\{r + sL(c) \le \mu \le r + sU(c)\} \ge \frac{p}{100}$$

for each Q in $\Pi$ and on the average over its c's. This would
make the interval (r + sL(c), r + sU(c)) at least a weak
polyconfidence interval. There are such intervals, the
sign-test-based non-parametric interval for the median being
a balanced one. Note that this particular weak
polyconfidence interval will not be strong for the same, nominal
value of p, at least if the pure Gaussian situation is in
the plot. This is so because preserving conditional
probabilities for the Gaussian situation requires using limits of
the form

$$\bar{y} \pm t^* s^*$$

where

$$(s^*)^2 = \frac{1}{n(n-1)} \sum (y_i - \bar{y})^2,$$

something quite different from limits of the form

$$y_a + t(y_b - y_a).$$

There are many weak polyconfidence intervals and a
discussion of how to choose one could indeed be lengthy. We
note that such nonparametric procedures as the sign test and
the (one-sample) Wilcoxon test offer very clearly specified
examples.

4. A Compromise: Singly-Strong Polyconfidence Intervals


April  16,  1981 




It is natural to believe that:

* doubly-strong polyconfidence intervals are wastefully
long,

* weak polyconfidence intervals are too subject to
challenge, in terms like "look at that configuration!",

* we need a compromise, where there is at least a reply
to such challenges. It is clear what one such compromise
would be.

If we knew two things, namely

$$\mathrm{Prob}_Q\{r + sL(c) \le \mu \le r + sU(c)\} \ge \frac{p}{100} \quad \text{all } Q \text{ in } \Pi$$

and

$$\mathrm{Prob}_{Q^*}\{r + sL(c) \le \mu \le r + sU(c) \mid c\} \ge \frac{p}{100} \quad \text{all } c, \text{ one } Q^*$$

then the answer to "but look at your c" could be "in Q* that
wouldn't matter". This deflects the challenge from the
knowable configuration to the unknowable situation. For
some this would be good enough; for others not. (The latter
would have to move to or toward a doubly-strong
polyconfidence interval.)


*  criss-crossing  * 

If $L_1(c), U_1(c)$ defines a strong (exact or conservative)
monoconfidence interval for Q* (in $\Pi$), and $L_2(c), U_2(c)$
defines a weak (exact or conservative) polyconfidence interval
over $\Pi$, then L(c), U(c), where

$$L(c) = \min\{L_1(c), L_2(c)\}$$
$$U(c) = \max\{U_1(c), U_2(c)\}$$

will be a singly-strong (conservative) polyconfidence
interval since

$$\mathrm{Prob}_{Q^*}\{r + sL(c) \le \mu \le r + sU(c) \mid c\} \ge \mathrm{Prob}_{Q^*}\{r + sL_1(c) \le \mu \le r + sU_1(c) \mid c\} \ge \frac{p}{100}$$

for all c and

$$\mathrm{Prob}_Q\{r + sL(c) \le \mu \le r + sU(c)\} \ge \mathrm{Prob}_Q\{r + sL_2(c) \le \mu \le r + sU_2(c)\} \ge \frac{p}{100}$$

for all Q in $\Pi$.

*  curtailment  * 

Even if $(L_1(c), U_1(c))$ is not the same as $(L_2(c), U_2(c))$,
we may have

$$\mathrm{Prob}_{Q^*}\{r + sL_2(c) \le \mu \le r + sU_2(c) \mid c\} \ge \frac{p}{100}$$

for some c's. For such c's we can surely take

$$L(c) = L_2(c)$$
$$U(c) = U_2(c)$$

reserving the "min" and "max" operations for where they are
really needed. Such a curtailed criss-cross is easily
implemented, once we are able to evaluate the probability
above, which equals

$$G_{Q^*c}(U_2(c)) - G_{Q^*c}(L_2(c)).$$

Now that we have started economizing, we can continue.
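
Criss-crossing and its curtailed version are simple enough to state as code for a single configuration. The sketch below assumes a hypothetical logistic con-con function for Q* (center 0.5, scale 0.2) and made-up limits; only the min/max rule and the curtailment test come from the text.

```python
import math


def criss_cross(L1, U1, L2, U2):
    """Criss-cross of a strong interval (L1, U1) on Q* and a weak one (L2, U2)."""
    return min(L1, L2), max(U1, U2)


def curtailed_criss_cross(L1, U1, L2, U2, G_star, p):
    """Keep (L2, U2) whenever it already has conditional coverage p/100 under Q*.

    G_star is the con-con function of Q* at this configuration, so that
    G_star(U2) - G_star(L2) is the conditional coverage of (L2, U2).
    """
    if G_star(U2) - G_star(L2) >= p / 100.0:
        return L2, U2
    return criss_cross(L1, U1, L2, U2)


def G_star(t):
    """Hypothetical con-con function of Q* at one configuration."""
    return 1.0 / (1.0 + math.exp(-(t - 0.5) / 0.2))


# Minimal balanced 95% strong interval under the hypothetical G_star.
half_width = 0.2 * math.log(0.975 / 0.025)
L1, U1 = 0.5 - half_width, 0.5 + half_width

# Weak limits that already cover 95% conditionally: curtailment keeps them.
kept = curtailed_criss_cross(L1, U1, -0.6, 1.15, G_star, 95.0)

# Weak limits that fall short conditionally: the min/max rule is needed.
widened = curtailed_criss_cross(L1, U1, -0.5, 1.0, G_star, 95.0)
```

In the first case the plain criss-cross would have stretched the upper end out to U1, so curtailment gives a strictly shorter interval there.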

*  patching  * 

How can we tighten our polyconfidence interval further?
One easy way starts with $L_2(c)$ and $U_2(c)$, and looks at

$$G_{Q^*c}(L_2(c)) \quad \text{and} \quad G_{Q^*c}(U_2(c)).$$

If these differ by at least p/100, we are content with

$$L(c) = L_2(c)$$
$$U(c) = U_2(c)$$

for all such c, and turn our attention to other c's, where

$$G_{Q^*c}(U_2(c)) - G_{Q^*c}(L_2(c)) < \frac{p}{100}.$$

We now look at some one of these other c's, and either
decrease $L_2(c)$ or increase $U_2(c)$ until either (with L
or U showing possibly changed values)

$$G_{Q^*c}(U_2(c)) - G_{Q^*c}(L_2(c)) = \frac{p}{100}$$

or

$$1 - G_{Q^*c}(U_2(c)) = G_{Q^*c}(L_2(c)).$$

If the former happens first, we stop there; else we
continue, preserving the last equality, until we have

$$G_{Q^*c}(L_2(c)) = \frac{100-p}{200}$$
$$G_{Q^*c}(U_2(c)) = \frac{100+p}{200}$$

Then, relabelling $L_2(c)$ as L(c), and $U_2(c)$ as U(c), we have
the desired values of L(c), U(c) for that particular c.
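
One reading of the patching rule can be written out explicitly. The text does not say which end to move first; the sketch below assumes we widen the end carrying the heavier tail, which reaches either the coverage equality or the balance equality first and then, if needed, moves both ends together. The function `patch`, the logistic `G` and its inverse `G_inv` are hypothetical illustrations, not the report's own procedure.

```python
import math


def G(t):
    """Hypothetical con-con function of Q* at one configuration."""
    return 1.0 / (1.0 + math.exp(-(t - 0.5) / 0.2))


def G_inv(q):
    """Inverse of the hypothetical G."""
    return 0.5 + 0.2 * math.log(q / (1.0 - q))


def patch(L2, U2, G, G_inv, p):
    """Widen (L2, U2) just enough to reach conditional coverage p/100 under Q*."""
    alpha = 1.0 - p / 100.0
    lo_tail = G(L2)            # chance of missing on the low side
    hi_tail = 1.0 - G(U2)      # chance of missing on the high side
    if lo_tail + hi_tail <= alpha:
        return L2, U2          # already covers enough: leave untouched
    half = alpha / 2.0
    if lo_tail >= hi_tail:
        if hi_tail <= half:
            lo_tail = alpha - hi_tail      # coverage equality is reached first
        else:
            lo_tail = hi_tail = half       # balance first, then both ends move
    else:
        if lo_tail <= half:
            hi_tail = alpha - lo_tail
        else:
            lo_tail = hi_tail = half
    return G_inv(lo_tail), G_inv(1.0 - hi_tail)


one_sided_fix = patch(-0.5, 1.0, G, G_inv, 95.0)   # only the upper end moves
two_sided_fix = patch(0.2, 0.8, G, G_inv, 95.0)    # both tails were too heavy
untouched = patch(-0.6, 1.15, G, G_inv, 95.0)      # already adequate
```

Under this reading an end is never moved inward, so the patched interval still contains the weak interval it started from.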


*  tuning  * 


We are not prepared to offer any meaningful comments
about the optimality of such an L(c), U(c) pair as a
singly-strong polyconfidence interval. There is a very real
possibility that we can do better, but we shall avoid this tuning
problem.


*  examples  * 


Let us take n = 10, $\mu$ = population median,
p = 1002/1024 ≈ 98%, and $\Pi$ = all reasonable simple
situations.

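With our conventions,

$$(y_2,\ y_{n-1}) = \left(r - \frac{y_a - y_2}{y_b - y_a}\,s,\ \ r + \left(1 + \frac{y_{n-1} - y_b}{y_b - y_a}\right) s\right)$$

where

$$L_2(c) = -\frac{y_a - y_2}{y_b - y_a}, \qquad U_2(c) = 1 + \frac{y_{n-1} - y_b}{y_b - y_a},$$

is a 1002/1024 exact weak polyconfidence interval over $\Pi$.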
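
The figure 1002/1024 can be checked by an exact binomial count. For n iid draws from a continuous distribution, the median falls below the k-th smallest observation exactly when fewer than k observations lie below the median. The helper name `sign_interval_coverage` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import comb


def sign_interval_coverage(n, k):
    """Exact P{ y_(k) < median < y_(n+1-k) } for n iid continuous draws.

    Each tail has probability sum_{j<k} C(n, j) / 2**n, and the two tails
    are equal, so this sign-test-based interval is balanced.
    """
    tail = sum(comb(n, j) for j in range(k))
    return 1 - Fraction(2 * tail, 2 ** n)


coverage = sign_interval_coverage(10, 2)   # second smallest to second largest
one_tail = (1 - coverage) / 2              # chance of missing on one side
```

With n = 10 and k = 2 each tail is 11/1024, the same one-sided level used for the Student's t point in the Gaussian interval.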
Similarly, for Q* = Gaussian,

$$(\bar{y} - t^* s^*,\ \bar{y} + t^* s^*)$$

where

$$\bar{y} = \frac{1}{n}\sum y_i$$
$$(s^*)^2 = \frac{1}{n(n-1)}\sum (y_i - \bar{y})^2$$

and t* is the upper 11/1024 percent point of Student's t on
19 degrees of freedom (t* ≈ 2.495), is a strong
monoconfidence interval at Q*.


The criss-cross L(c), U(c), easily calculated for any
configuration, will be a singly-strong polyconfidence
interval over all of $\Pi$. Almost certainly such a choice will be
quite wasteful, since $(s^*)^2$ is easily enlarged by the
extreme values of $y_1$ or $y_n$. This will be only slightly less
true if we curtail the criss-cross, or, probably, if we
replace it by a patching.

It might well be reasonable to study the results of

* using sign-test or Wilcoxon weak polyconfidence
intervals for the median as $L_2(c)$, $U_2(c)$,

* taking Q* as an intermediate situation, say a simple
slacu,

* taking $L_1(c)$, $U_1(c)$ as the strong monoconfidence
interval on Q* based on the w5-biweight;

* considering all three of criss-cross, curtailment,
and patching.

5.  The  Position 

We thus have at least three qualities of polyconfidence
intervals,

* doubly-strong (easy to find; "wasteful" to some)

* singly-strong (allows an answer to the c-challenge, examples
easy, tuning not likely to be easy)

* weak (many alternatives, tuning not trivial, but
probably feasible)

each of which may or may not further include the requirement
of balance. If we had examples of all of these for one or
more plots, each user could take his/her choice.

Thus we ought to turn to the question of finding such
intervals.

B

THE  CALCULATIONS 


6. The Set-up

We look toward configural polysampling as the basis for
our practical work. We should have, then:

* a small plot, consisting of perhaps 2 to 7 situations
$S_1, S_2, \ldots, S_{q_{\max}}$, for each of which a center, $\mu$, is
specified,

* a polysample of configurations {c} = c(1), ..., c(m)
(note that each c(i) is either an (n-2)-entry vector
array or an n-entry vector array with two fixed entries),

* weights $W_{qj}$ appropriate for use when c(j) is to be
used as a configuration sampled for $S_q$,

* con-con functions $G_{qc}(t)$ applicable for these q's
and c's.

The character and building of all these pieces, except
the con-con function, is discussed in Technical Report 185
(Pregibon and Tukey, 1981) and Technical Report 191 (Bell
and Pregibon, 1981) (see also Rogers and Relles, 1973 for
formulas in the case a = 1, b = n). The basic results depend on
averages, for fixed c, represented first as integrations
over r and s and then as integrations over a rectangle (where
Gaussian quadrature formulas apply each way). If $I_{Qc}(r,s,t)$
is the indicator function

$$I_{Qc}(r,s,t) = \begin{cases} 1, & \text{if } \mu < y_a + t(y_b - y_a) = r + st \\ 0, & \text{else} \end{cases}$$

which depends on q (where $Q = S_q$) and c, then

$$G_{Qc}(t) = \mathrm{Prob}_Q\{\mu < r + st \mid c\} = \mathrm{ave}_{Qc}\{I_{Qc}(r,s,t)\}$$

and this can be evaluated for selected values of t in a way
similar to the other integral evaluations.
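
As a rough illustration of the indicator average, suppose the conditional distribution of (r, s) given c has already been reduced to a finite set of weighted points, for example quadrature nodes with their weights. That reduction, the function name `con_con_estimate`, and the numbers below are assumptions for the sketch; the report itself refers to Gaussian quadrature over a rectangle.

```python
def con_con_estimate(t, mu, nodes):
    """Weighted average of the indicator of {mu < r + s*t}.

    nodes is a sequence of (r, s, weight) triples standing in for the
    conditional distribution of the anchors given one configuration c.
    """
    hit = sum(w for r, s, w in nodes if mu < r + s * t)
    total = sum(w for _, _, w in nodes)
    return hit / total


# Hypothetical weighted points for one configuration, with center mu = 0.31.
nodes = [(0.0, 1.0, 1.0), (-0.5, 2.0, 2.0), (0.2, 0.5, 1.0)]
values = [con_con_estimate(t, 0.31, nodes) for t in (0.0, 0.3, 0.5)]
```

Because s is positive, the estimate is nondecreasing in t, as a con-con function should be.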

7.  Doubly-strong  Polyconfidence  Intervals 

To find the minimal balanced 95% polyconfidence interval,
for each of the configurations of the polysample, we
have merely to:

* evaluate the con-con functions $G_{qc}(t)$ for each q in
the plot, each c in the polysample, and well-selected
values of t,

* calculate the + and - con-con functions from

$$G_{-\Pi c}(t) = \min\{G_{qc}(t) \mid S_q \text{ in } \Pi\}$$
$$G_{+\Pi c}(t) = \max\{G_{qc}(t) \mid S_q \text{ in } \Pi\}$$

the former for higher values of t, the latter for lower
values of t,

* solve the equations

$$G_{+\Pi c}(t) = 2.5\%$$
$$G_{-\Pi c}(t) = 97.5\%$$

for each c in the polysample,

* take these values of t as L(c) and U(c),

* assert, for these configurations (and any others to
be treated later), that the interval in question is
from

$$r + sL(c) = y_a + L(c)(y_b - y_a)$$

to

$$r + sU(c) = y_a + U(c)(y_b - y_a).$$

If we want to know how this polyconfidence interval
performs, we average what we see at the given polysamples
using the appropriate weights. Thus the average length of
our confidence intervals, which we do NOT think is likely to
be a good criterion to consider, would be found as an
estimate of

$$\mathrm{ave}\{(U(c) - L(c))\,s\}.$$

Thus we need not expect too much difficulty (a) in
finding L(c) and U(c) and (b) in evaluating simple
properties.

8.  Tuning  Weak  Monoconfidence  Intervals  for  Average  Lengths 

Before we attempt to tune more complicated structures,
it is well to consider tuning weak monoconfidence intervals
using the same dubious criterion of average length. Here we
wish to choose L(c) and U(c) to satisfy

$$\mathrm{ave}^{Q}\{G_{Qc}(U(c)) - G_{Qc}(L(c))\} \ge \frac{p}{100}$$

while making

$$\mathrm{ave}^{Q}\{(U(c) - L(c))\,s_{Qc}\} = \mathrm{ave}^{Q}\{U(c)\,s_{Qc}\} - \mathrm{ave}^{Q}\{L(c)\,s_{Qc}\}$$

small. (Note that $\mathrm{ave}^{Q}$ means what might also be written
$\mathrm{ave}_q$, indicating averaging over the indicated part of a
selected instance of the situation; averaging that would
include explicit use of weights, were this necessary, as
it will be in the polysampling case.) If we have, say, 500
configurations on which we are working, we have a
constrained optimum problem with 1000 variables. Direct
approaches are likely to be inefficient.

We will ordinarily find $G_{Qc}(t)$ ogive-shaped, and its
derivative, $g_{Qc}(t)$, unimodal. We will shortly have occasion
to be concerned with two inverses of $g_{Qc}(t)$ which we can
define, in general, by

$h^-_{Qc}(u)$: the algebraically smallest t with $g_{Qc}(t) = u$,
$h^+_{Qc}(u)$: the algebraically largest t with $g_{Qc}(t) = u$.

So long at least as $g_{Qc}$ is continuous, these will be
well-defined on $0 < u < \max g_{Qc}$ (though they might be
discontinuous, since $g_{Qc}(\ )$ might not be unimodal). If we
write

$$S_1 = \sum b_{Qc}\, G_{Qc}(U(c))$$
$$S_2 = \sum b_{Qc}\, G_{Qc}(L(c))$$
$$S_3 = \sum b_{Qc}\, s_{Qc}\, U(c)$$
$$S_4 = \sum b_{Qc}\, s_{Qc}\, L(c)$$

where the $b_{Qc}$ incorporate the needed weights, if any, for
the 4 sums that we will use to replace the averages we used
to state the problem, we will want to

$$\text{minimize } S_3 - S_4 \quad \text{subject to } S_1 - S_2 \ge \frac{p}{100}$$

which is naturally attacked with a Lagrange multiplier, $\lambda$,
by minimizing

$$S_3 - S_4 - \lambda(S_1 - S_2)$$

and then choosing $\lambda$ to satisfy the constraint. Since $S_3$ and
$S_1$ are functions of the {U(c)} alone (and $S_2$ and $S_4$ of the
{L(c)}), we can extremalize

$$S_3 - \lambda S_1 \quad \text{and} \quad S_4 - \lambda S_2$$

separately (unless this leads to some U(c) < the corresponding
L(c)). This leads, on differentiating w.r.t. U(c), to

$$b_{Qc}\, s_{Qc} - \lambda\, b_{Qc}\, g_{Qc}(U(c)) = 0$$

whence we may take

$$U(c) = h^+_{Qc}(s_{Qc}/\lambda)$$

similarly, we may take

$$L(c) = h^-_{Qc}(s_{Qc}/\lambda)$$

What remains is to empirically choose $\lambda$ to make

$$\mathrm{ave}^{Q}\{G_{Qc}(U(c)) - G_{Qc}(L(c))\}$$

equal to the desired p/100.
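
The Lagrange recipe can be tried on a toy case where both inverses of the density are available in closed form. The sketch below assumes logistic con-con functions, for which g(t) = q(1-q)/sigma with q = G(t), so that g(t) = u has the two roots q = (1 ± sqrt(1 - 4 u sigma))/2. The configurations, their weights, and the function names are hypothetical; only the stationarity condition g(U) = g(L) = s/lambda and the final search on lambda follow the text.

```python
import math


def G(t, m, sigma):
    """Hypothetical logistic con-con function with center m and scale sigma."""
    z = (t - m) / sigma
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def g(t, m, sigma):
    """Derivative of G: unimodal, with peak 1/(4*sigma) at t = m."""
    q = G(t, m, sigma)
    return q * (1.0 - q) / sigma


def limits(lam, m, sigma, s_bar):
    """Smallest and largest t with g(t) = s_bar/lam (h-minus and h-plus)."""
    v = sigma * s_bar / lam              # need q*(1-q) = v
    if v >= 0.25:                        # level above the peak: empty interval
        return m, m
    root = math.sqrt(1.0 - 4.0 * v)
    q_lo, q_hi = 0.5 * (1.0 - root), 0.5 * (1.0 + root)
    return (m + sigma * math.log(q_lo / (1.0 - q_lo)),
            m + sigma * math.log(q_hi / (1.0 - q_hi)))


def average_coverage(lam, configs):
    """Weighted average of G(U(c)) - G(L(c)) over the configurations."""
    total = sum(b for b, _, _, _ in configs)
    cov = 0.0
    for b, m, sigma, s_bar in configs:
        L, U = limits(lam, m, sigma, s_bar)
        cov += b * (G(U, m, sigma) - G(L, m, sigma))
    return cov / total


def tune_lambda(configs, p, lo=1e-6, hi=1e6, iters=200):
    """Geometric bisection on lambda; average coverage increases with lambda."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if average_coverage(mid, configs) < p / 100.0:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical configurations: (weight b, center m, scale sigma, mean scale s_bar).
configs = [(1.0, 0.5, 0.10, 0.8), (1.0, 0.4, 0.20, 1.0), (2.0, 0.6, 0.15, 1.5)]
lam = tune_lambda(configs, p=95.0)
tuned = [limits(lam, m, sigma, s_bar) for _, m, sigma, s_bar in configs]
```

The tuned intervals give up coverage at configurations where length is expensive and buy it back where it is cheap, so their weighted average length is no larger than that of giving every configuration its own balanced 95% interval.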

9. Other Criteria

We  also  want  to  be  able  to  tune  our  monoconfidence 

intervals  for  other  criteria,  which  deserve  some  discussion 
in  their  own  right. 

We all recognize some form of confidence interval as
the best we can do. And few of us want a 99.9999% interval.
Thus we have come to accept a meaningful but small (say 5%
or 1%) chance that our confidence interval will not cover
the center at which it is aimed. Should we then pay too
much attention to a small chance that the interval is very
long?

When the interval is too long, it is unhelpful, but we
know it. Can this be nearly as important as missing its
target? It would seem that the answer should be a rousing
"no"! If we are going to allow 5% misses, then we ought, we
may very well argue, to accept 10% overlong intervals. So we seek
related criteria.

Two  natural,  but  possibly  naive,  choices  are: 

* the 90% point of the length of the confidence intervals,
and

* the average length of the shortest 90% of all confidence
intervals.

If  we  are  doing  direct  sampling,  either  of  these  can  be  used 
simply  and  directly,  just  by  sorting  the  empirical  interval 
lengths. 
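
For direct sampling, both criteria reduce to a sort. The sketch below uses one possible convention for the 90% point (the largest of the shortest ceil(0.9 n) lengths); that convention, the function names, and the data are assumptions for illustration.

```python
def ninety_percent_point(lengths):
    """Largest of the shortest 90% of the empirical interval lengths."""
    ordered = sorted(lengths)
    k = (9 * len(ordered) + 9) // 10      # ceil(0.9 * n) in integer arithmetic
    return ordered[k - 1]


def mean_of_shortest_ninety(lengths):
    """Average length of the shortest 90% of the empirical intervals."""
    ordered = sorted(lengths)
    k = (9 * len(ordered) + 9) // 10
    return sum(ordered[:k]) / k


# Hypothetical interval lengths from ten direct samples.
lengths = [7.0, 3.0, 10.0, 1.0, 5.0, 9.0, 2.0, 8.0, 4.0, 6.0]
point = ninety_percent_point(lengths)
short_mean = mean_of_shortest_ninety(lengths)
```

Neither quantity changes if the single longest interval is made arbitrarily longer, which is the insensitivity to overlong intervals argued for above.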

If we are working with configurations, we have to add a
loop. For what can be reasonably calculated at a
configuration is the chance that an interval (or a configuration)
should be shorter than a prescribed length. If we
prescribe a length, moderately extensive computation gives
us the % of intervals shorter than this length. Then we have to
adjust the length, and iterate. If we must, we must.

But let us think about our criteria with a little more
care. Perhaps the last two assess, separately, two aspects
that we might like to assess together. Suppose that we took

$$\frac{9}{10}\,\mathrm{ave}\{90\%\ \text{shortest lengths}\} + \frac{1}{10}\,(90\%\ \text{point})$$

This is the average of a saturating function

* length, below the bend at the 90% length

* 90% length, else

There are possible virtues to such a combination. Let K be
a trial value for the 90% length, then we would estimate

$$\mathrm{ave}\{\text{length} \mid \text{length} \le K\}$$
$$\mathrm{Prob}\{\text{length} \le K\}$$

which is equivalent to estimating

$$\mathrm{ave}\{K - \text{length} \mid \text{length} \le K\}$$
$$\mathrm{Prob}\{\text{length} \le K\}$$

where we still must iterate on K to make the probability equal
to p/100. The criterion will then take the value

$$K - \mathrm{ave}\{K - \text{length} \mid \text{length} \le K\} = K - K\,\mathrm{ave}\left\{1 - \frac{\text{length}}{K} \,\Big|\, \text{length} \le K\right\}$$

where the last factor is a relatively slowly changing
function of K.
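
The claim that the combined criterion is the average of a saturating function can be checked numerically on an empirical sample. The sketch assumes a sample size divisible by 10 and takes K as the largest of the shortest 90% of lengths; the function name and the data are hypothetical.

```python
def select_criterion(lengths):
    """Return the 9/10 + 1/10 combination and the saturated average.

    K is the largest of the shortest 90% of the lengths. The first value is
    0.9 * ave{90% shortest lengths} + 0.1 * K, the second is the plain
    average of min(length, K).
    """
    ordered = sorted(lengths)
    n = len(ordered)
    k = (9 * n + 9) // 10
    K = ordered[k - 1]
    combination = 0.9 * (sum(ordered[:k]) / k) + 0.1 * K
    saturated = sum(min(v, K) for v in lengths) / n
    return combination, saturated


# Hypothetical interval lengths; one interval is badly overlong.
lengths = [2.0, 1.0, 4.0, 3.0, 50.0, 2.5, 1.5, 3.5, 0.5, 4.5]
combination, saturated = select_criterion(lengths)
```

Replacing the overlong length 50.0 by any larger value leaves both numbers unchanged, since the function saturates at K.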

10. Iterating for the Select Criterion

Let us put

$$A_{Qc}(u) = \mathrm{ave}_{r,s}\left\{\left(1 - \frac{s}{u}\right) \,\Big|\, \text{given } s < u\right\}$$
$$B_{Qc}(u) = \mathrm{prob}_{r,s}\{s < u\}$$

so that the probability that the interval length be less
than K is, for one c,

$$B_{Qc}\left(\frac{K}{|U(c) - L(c)|}\right)$$

so that we must eventually control

$$\sum b_{Qc}\, B_{Qc}\left(\frac{K}{|U(c) - L(c)|}\right)$$

by changing K. For K fixed, however, we desire to minimize
K - (K times the following) and hence, for fixed K, to
maximize

$$\sum b_{Qc}\, A_{Qc}\left(\frac{K}{|U(c) - L(c)|}\right)$$

since

$$1 - \frac{\text{length}}{K} = 1 - \frac{(U(c) - L(c))\,s}{K} = 1 - \frac{s}{K/(U(c) - L(c))}.$$


The  Lagrangiar  form  to  be  extremal  is  now 

SbQc  AQc  iU  (c)  -  L  (c)  !  "  (SbQCGQC  (U  (c)  5 
whose  derivatives  wirt  U(c),  and  L(c)  with 


Sb0e°CC<L<e>> 


a0c<u)  ■  SI  »QC  (U) 
are,  less  the  common  factor  b_„ 

QC 
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-  a 


*  I  . 

Qc  |  U  (c)  -L  (c)  I 


K 


(U(c)-L(c)) 


2  - 


and 


+  a 


Qc  |U  (c)-L  (c)  | 


K 


(U  (c)-L(c)  ) 


2  -  >90c(t(e)) 


If  these  vanish,  so  does  their  difference,  hence 


gQc  (U  (c)  )  *  9Qc<Mc)) 
as  tacitly  before,  giving 

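\[ U(c) = g_{Qc}^{-1}\bigl(g_{Qc}(L(c))\bigr) \]

and

\[ U(c) - L(c) = g_{Qc}^{-1}\bigl(g_{Qc}(L(c))\bigr) - L(c) = f_{Qc}(L(c)) \]

so that

\[ \lambda\, g_{Qc}(L(c)) = a_{Qc}\!\left(\frac{K}{f_{Qc}(L(c))}\right) \frac{K}{\bigl(f_{Qc}(L(c))\bigr)^2} \]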


which should be soluble for L(c), given λ, possibly with some effort. Once this is done for all c in our sample of configurations, we will again want to check


\[ \operatorname{Prob}\{\text{length} < K\} = \sum_c b_{Qc}\, B_{Qc}\!\left(\frac{K}{|U(c) - L(c)|}\right) \]


and  adjust  K  to  bring  this  to  the  desired  value. 
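The outer adjustment of K can be sketched as a bisection, assuming that each configuration c carries a weight b_c, a current width U(c) − L(c), and a sorted sample of s values standing in for B_Qc. All names and numbers below are hypothetical. The sketch relies only on the probability being nondecreasing in K.

```python
from bisect import bisect_left

def prob_length_below(K, weights, widths, s_samples):
    """Sum over c of b_c * B_c(K / |U(c) - L(c)|), with B_c an empirical CDF.

    Each entry of s_samples must be sorted in increasing order.
    """
    total = 0.0
    for b, w, s in zip(weights, widths, s_samples):
        u = K / abs(w)
        total += b * bisect_left(s, u) / len(s)   # fraction of s strictly below u
    return total

def tune_K(target, weights, widths, s_samples, lo=0.0, hi=1e6, iters=200):
    """Bisection for the smallest K (within [lo, hi]) reaching the target probability."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if prob_length_below(mid, weights, widths, s_samples) < target:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical two-configuration example.
weights = [0.5, 0.5]
widths = [2.0, 4.0]
s_samples = [[0.5, 1.0, 1.5, 2.0], [0.5, 1.0, 1.5, 2.0]]
K = tune_K(0.75, weights, widths, s_samples)
```

In the full procedure the widths themselves depend on K, so this adjustment would alternate with re-solving for L(c) and U(c) at each trial K.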


The process is appreciably more complicated than for the average length criterion, but apparently not unbearably so.
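The matching step of this section, finding the U(c) on the far side of the mode at which g takes the same value as at L(c), can be sketched numerically. The sketch assumes a unimodal density with a known mode; the density `g` below is an arbitrary skewed stand-in, not one of the report's g_Qc, and `matching_upper` is a hypothetical helper name.

```python
import math

def matching_upper(g, L, mode, upper_bound, iters=200):
    """Find U > mode with g(U) = g(L), for a unimodal g and L < mode.

    Assumes g is decreasing on [mode, upper_bound] and g(upper_bound) < g(L).
    """
    target = g(L)
    lo, hi = mode, upper_bound
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid          # still above the target height: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g(t):
    """Assumed stand-in density shape on t > 0, with its mode at t = 1."""
    return t * math.exp(-t)

L = 0.5
U = matching_upper(g, L, mode=1.0, upper_bound=50.0)
width = U - L                 # plays the role of f(L) = U - L
```

Because the stand-in density is skewed, the resulting interval is not symmetric about the mode, which is the point of solving for U(c) rather than reflecting L(c).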
11. Tuning Weak Polyconfidence Intervals


Suppose  we  want  a  weak  polyconfidence  interval  for  a 
plot  consisting  of  two  situations,  A  and  Z.  We  now  have,  if 
we  stick  to  the  simple  case  of  the  average  length  criterion, 
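eight S's to consider:

\[
\begin{aligned}
S_{A1} &= \sum_c b_{Ac}\, G_{Ac}(U(c)), & S_{A2} &= \sum_c b_{Ac}\, G_{Ac}(L(c)), \\
S_{Z1} &= \sum_c b_{Zc}\, G_{Zc}(U(c)), & S_{Z2} &= \sum_c b_{Zc}\, G_{Zc}(L(c)), \\
S_{A3} &= \sum_c b_{Ac}\, s_{Ac}\, U(c), & S_{A4} &= \sum_c b_{Ac}\, s_{Ac}\, L(c), \\
S_{Z3} &= \sum_c b_{Zc}\, s_{Zc}\, U(c), & S_{Z4} &= \sum_c b_{Zc}\, s_{Zc}\, L(c)
\end{aligned}
\]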


and we desire to minimize, jointly,

\[ S_{A3} - S_{A4} \]

and

\[ S_{Z3} - S_{Z4} \]

subject to

\[ S_{A1} - S_{A2} \ge p/100 \]

\[ S_{Z1} - S_{Z2} \ge p/100 \]

This clearly calls for one pair (α, β) of shadow prices and one pair of Lagrange multipliers, all of which leave us minimizing

\[ \alpha (S_{A3} - S_{A4}) + \beta (S_{Z3} - S_{Z4}) - \lambda_A (S_{A1} - S_{A2}) - \lambda_Z (S_{Z1} - S_{Z2}) \]

which can again be done separately for two parts, here minimizing

\[ \alpha S_{A3} + \beta S_{Z3} - \lambda_A S_{A1} - \lambda_Z S_{Z1} \]

and maximizing

\[ \alpha S_{A4} + \beta S_{Z4} - \lambda_A S_{A2} - \lambda_Z S_{Z2} \]

which  lead,  on  differentiating  w.r.t.  U(c)  and  L(c), 
respectively,  to 

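\[ 0 = \alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc} - \lambda_A\, b_{Ac}\, g_{Ac}(U(c)) - \lambda_Z\, b_{Zc}\, g_{Zc}(U(c)) \]

and

\[ 0 = \alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc} - \lambda_A\, b_{Ac}\, g_{Ac}(L(c)) - \lambda_Z\, b_{Zc}\, g_{Zc}(L(c)) \]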

If we now write

\[ \lambda_A = \lambda\theta, \qquad \lambda_Z = \lambda(1 - \theta) \]

both right-hand pairs of terms can be written in terms of

\[ \theta\, b_{Ac}\, g_{Ac}(t) + (1 - \theta)\, b_{Zc}\, g_{Zc}(t) = h_{\theta c}(t) \]

whose inverses can be written as

\[ h^{+}_{\theta c}(u) \quad\text{and}\quad h^{-}_{\theta c}(u) \]
so  that  we  have 

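\[ 0 = \alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc} - \lambda\, h_{\theta c}(U(c)) \]

\[ 0 = \alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc} - \lambda\, h_{\theta c}(L(c)) \]

and

\[ U(c) = h^{+}_{\theta c}\!\left(\frac{\alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc}}{\lambda}\right) \]

\[ L(c) = h^{-}_{\theta c}\!\left(\frac{\alpha\, b_{Ac}\, s_{Ac} + \beta\, b_{Zc}\, s_{Zc}}{\lambda}\right) \]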

where θ and λ must now be varied jointly to ensure

\[ \sum_c b_{Ac}\, G_{Ac}(U(c)) - \sum_c b_{Ac}\, G_{Ac}(L(c)) \ge p/100 \]

\[ \sum_c b_{Zc}\, G_{Zc}(U(c)) - \sum_c b_{Zc}\, G_{Zc}(L(c)) \ge p/100 \]

Outside of this loop, we must vary α/β (we will simplify matters by forcing, say, α + β = 2) in order to get the right joint minimum for the two average lengths.
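The innermost step for a single configuration, solving h(t) = v on both sides of the mode with v = (α b_Ac s_Ac + β b_Zc s_Zc)/λ, can be sketched as follows. This is a sketch under stated assumptions: the two densities are taken to be normal with nearby centers so that h is unimodal, every numerical value is hypothetical, and the helper names (`make_h`, `find_mode`, `invert_branch`, `endpoints`) are illustrative rather than the report's.

```python
import math

def normal_pdf(t, mu=0.0, sigma=1.0):
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def make_h(theta, b_A, g_A, b_Z, g_Z):
    """h(t) = theta * b_A * g_A(t) + (1 - theta) * b_Z * g_Z(t)."""
    return lambda t: theta * b_A * g_A(t) + (1.0 - theta) * b_Z * g_Z(t)

def find_mode(h, lo, hi, iters=200):
    """Ternary search for the maximum; assumes h is unimodal on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def invert_branch(h, v, inner, outer, iters=200):
    """Solve h(t) = v between the mode (inner) and a far point (outer)."""
    for _ in range(iters):
        mid = 0.5 * (inner + outer)
        if h(mid) > v:
            inner = mid       # still above level v: move away from the mode
        else:
            outer = mid
    return 0.5 * (inner + outer)

def endpoints(alpha, beta, lam, theta, b_A, s_A, b_Z, s_Z, g_A, g_Z,
              lo=-20.0, hi=20.0):
    """Return (L, U) as the lower-branch and upper-branch inverses of h at v."""
    h = make_h(theta, b_A, g_A, b_Z, g_Z)
    v = (alpha * b_A * s_A + beta * b_Z * s_Z) / lam
    mode = find_mode(h, lo, hi)
    if v >= h(mode):
        raise ValueError("level v is not below the peak of h; increase lam")
    return invert_branch(h, v, mode, lo), invert_branch(h, v, mode, hi)

# Hypothetical inputs: situation A centered at 0, situation Z centered at 0.5.
g_A = normal_pdf
g_Z = lambda t: normal_pdf(t, mu=0.5)
L, U = endpoints(alpha=1.0, beta=1.0, lam=20.0, theta=0.5,
                 b_A=1.0, s_A=1.0, b_Z=1.0, s_Z=1.0, g_A=g_A, g_Z=g_Z)
```

An outer search over θ and λ would call `endpoints` for every configuration and then check the two coverage sums against p/100.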

Plausibly what we may seek at this point is

\[ \frac{\operatorname{ave}_A\{\text{polyconfidence length}\}}{\min \operatorname{ave}_A\{\text{monoconfidence length}\}} = \frac{\operatorname{ave}_Z\{\text{polyconfidence length}\}}{\min \operatorname{ave}_Z\{\text{monoconfidence length}\}} \]

thus maximizing a polyefficiency defined by

\[ \text{polyefficiency} = \min\{\text{monoefficiencies}\} \]

\[ A\text{-monoefficiency} = \left( \frac{A\text{-minimum average length}}{\text{actual minimum length}} \right)^{2} \]

\[ Z\text{-monoefficiency} = \left( \frac{Z\text{-minimum average length}}{\text{actual minimum length}} \right)^{2} \]
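As a small illustration of the polyefficiency defined above, the following sketch computes the squared length ratio for each situation and takes the minimum. The denominator is read here as the average length actually achieved by the polyconfidence procedure in that situation, which is an interpretation; the lengths are hypothetical.

```python
def monoefficiency(minimum_average_length, achieved_length):
    """Squared ratio of the best attainable average length to the achieved one."""
    return (minimum_average_length / achieved_length) ** 2

def polyefficiency(lengths_by_situation):
    """Minimum monoefficiency over situations.

    lengths_by_situation maps a situation name to a pair
    (minimum average length, achieved length).
    """
    return min(monoefficiency(m, a) for m, a in lengths_by_situation.values())

# Hypothetical lengths for the two situations A and Z.
eff = polyefficiency({"A": (2.0, 2.5), "Z": (3.0, 3.2)})
```

Since the polyefficiency is a minimum, it is largest when the two ratios are balanced, which corresponds to the equality of ratios sought by varying α/β.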


12.  Comment 

The calculations are clearly getting moderately complex. They will get somewhat worse for other criteria or more situations. But they seem likely to be feasible.

13.  Tuning  Singly-strong  Confidence  Intervals 

If we have also fixed a situation Q*, at which we want our polyconfidence interval to be a strong monoconfidence interval, what we have done is to require

GQc(U(c))  -  GQc(L(c))  >  p/100 
for  each  c. 

We expect the usual situation. For some c (given, say, α/β and θ) this condition will be satisfied, for others not. For the others we must involve Q in the choice of U(c) and L(c).

We can easily parametrize the possibilities in terms of L(c). Given a sufficiently small L(c) (for given α/β and θ), there will be:

*  the  smallest  U(c)  that  satisfies  the  condition 
displayed  above, 

*  a minimum cost U(c), depending upon α/β and θ.

Consider the larger of these, and the total cost, at c, of the (L(c), U(c)) pair, as given in terms of α, β, and θ.

Now  choose  L(c)  to  minimize  this  cost  and  U(c)  to  be  the 
corresponding  larger  value. 
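The per-configuration search just described can be sketched with a simple grid over L(c). This is only a sketch under assumptions: G_Q is taken to be a standard normal distribution function, the cost of a pair is taken to be its plain length U − L, and the unconstrained minimum-cost U is a fixed hypothetical number; in the report's setting the cost and that U would come from α, β, and θ.

```python
import math

def normal_cdf(t, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def smallest_U(G, L, p, hi=50.0, iters=200):
    """Smallest U with G(U) - G(L) >= p/100, or None when L is too large."""
    if G(L) + p / 100.0 >= 1.0:
        return None
    lo = L
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) - G(L) >= p / 100.0:
            hi = mid
        else:
            lo = mid
    return hi

def best_pair(G_Q, cost, U_min_cost, p, L_grid):
    """For each trial L take the larger of the two candidate U's; keep the cheapest pair."""
    best = None
    for L in L_grid:
        U_strong = smallest_U(G_Q, L, p)
        if U_strong is None:
            continue              # no U can give the required coverage from this L
        U = max(U_strong, U_min_cost)
        c = cost(L, U)
        if best is None or c < best[0]:
            best = (c, L, U)
    return best

# Hypothetical inputs: cost is plain length, and the weak solution wanted U = 1.0.
L_grid = [-4.0 + 0.01 * i for i in range(401)]
cost_value, L, U = best_pair(normal_cdf, lambda L, U: U - L, 1.0, 90, L_grid)
```

A grid is used only for transparency; any one-dimensional minimizer over L(c) would serve the same purpose.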

When this has been done for all c, we are ready to vary α, β and θ to obtain the desired result.

Again  the  process  has  become  somewhat  more  complicated, 
but  is  probably  still  feasible. 